Using Unknown Word Techniques to Learn Known Words

نویسندگان

  • Kostadin Cholakov
  • Gertjan van Noord
چکیده

Unknown words are a hindrance to the performance of hand-crafted computational grammars of natural language. However, words with incomplete and incorrect lexical entries pose an even bigger problem because they can be the cause of a parsing failure despite being listed in the lexicon of the grammar. Such lexical entries are hard to detect and even harder to correct. We employ an error miner to pinpoint words with problematic lexical entries. An automated lexical acquisition technique is then used to learn new entries for those words which allows the grammar to parse previously uncovered sentences successfully. We test our method on a large-scale grammar of Dutch and a set of sentences for which this grammar fails to produce a parse. The application of the method enables the grammar to cover 83.76% of those sentences with an accuracy of 86.15%.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

earning evin night

We develop techniques for learning the meanings of unknown words in context. Working within a com-positional semantics framework, we write down equations in which a sentence's meaning is some combination function of the meaning of its words. When one of the words is unknown, we ask for a paraphrase of the sentence. We then compute the meaning of the unknown word by inverting parts of the semant...

متن کامل

A Probabilistic Model for Semantic Word Vectors

Vector representations of words capture relationships in words’ functions and meanings. Many existing techniques for inducing such representations from data use a pipeline of hand-coded processing techniques. Neural language models offer principled techniques to learn word vectors using a probabilistic modeling approach. However, learning word vectors via language modeling produces representati...

متن کامل

Co-learning of Word Representations and Morpheme Representations

The techniques of using neural networks to learn distributed word representations (i.e., word embeddings) have been used to solve a variety of natural language processing tasks. The recently proposed methods, such as CBOW and Skip-gram, have demonstrated their effectiveness in learning word embeddings based on context information such that the obtained word embeddings can capture both semantic ...

متن کامل

The effect of teaching the etymology of words to learn and reinforce vocabulary by Iranian children

This study was an attempt to investigate the effect of teaching morphemes on young EFLlearners. To this end, 23 young EFL learners in two different classes who studied at intermediatelevel were selected to participate in the present study. The participants were divided into twogroups. One of the groups was treated as experimental and the other one was considered as thecontrol group. Furthermore...

متن کامل

Subword-based Automatic Lexicon Learning for ASR

We present a framework for learning a pronunciation lexicon for an Automatic Speech Recognition (ASR) system from multiple utterances of the same training words, where the lexical identities of the words are unknown. Instead of only trying to learn pronunciations for known words we go one step further and try to learn both spelling and pronunciation in a joint optimization. Decoding based on li...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010